chore: router ci improvement #2234
Conversation
- improved Redis usage in the CI
- renamed TestNatsEvents to TestFlakyNatsEvents so it is retried on failure
- commented out a suspected faulty test
Walkthrough

CI adds three Redis cluster services, a readiness/initialization step, and dynamic ACL/config application. Several flaky tests are moved into new top-level test functions or commented out. The gotestsum invocation and version are updated, and test rerun support is added to the Makefiles.

Changes

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
Router image scan passed ✅ No security vulnerabilities found in image:
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.github/workflows/router-ci.yaml (1)
321-331: Regex excludes many legitimate tests; tighten to “not starting with TestFlaky”.
`^Test[^(Flaky)]` will skip any test whose first char after “Test” is F/l/a/k/y/“(”. Replace with an alternation that matches everything except `TestFlaky…`.

```diff
-        run: make test test_params="-run '^Test[^(Flaky)]' --timeout=5m -p 1 --parallel 10" test_target="${{ matrix.test_target }}"
+        run: make test test_params="-run '^(Test[^F].*|TestF[^l].*|TestFl[^a].*|TestFla[^k].*|TestFlak[^y].*)$' --timeout=5m -p 1 --parallel 10" test_target="${{ matrix.test_target }}"
```

If you’d prefer simpler ops, an alternative (sketched below) is to keep `-run '^Test'` here and ensure all flaky suites are skipped via `t.Skip` behind an env flag (e.g., `SKIP_FLAKY=1`) and run them explicitly in the flaky step with that flag unset.
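For illustration, a minimal Go sketch of that env-flag alternative; the `skipIfFlakyDisabled` helper and the `SKIP_FLAKY` variable name are assumptions, not part of this PR:

```go
package events

import (
	"os"
	"testing"
)

// skipIfFlakyDisabled gates a flaky suite behind an environment flag. The
// stable CI step would run with SKIP_FLAKY=1 and a plain -run '^Test'; the
// dedicated flaky step leaves the flag unset so the suite executes.
func skipIfFlakyDisabled(t *testing.T) {
	t.Helper()
	if os.Getenv("SKIP_FLAKY") == "1" {
		t.Skip("flaky suite disabled; run via the dedicated flaky CI step")
	}
}

func TestFlakyNatsEvents(t *testing.T) {
	skipIfFlakyDisabled(t)
	t.Parallel()
	// ... flaky sub-tests elided ...
}
```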
🧹 Nitpick comments (3)
router-tests/ratelimit_test.go (1)
703-707: Don’t comment out tests; skip with rationale and TODO.

Convert the commented case into a `t.Run` that calls `t.Skipf(...)` explaining the cluster default-user password caveat. This keeps the test discoverable and documents intent.

```diff
-		// Seems like not working as expected ? (if password is configured for the cluster default user)
-		//{
-		//	name:            "should successfully use auth from later url if no auth in first urls",
-		//	clusterUrlSlice: []string{"redis://localhost:7003", "rediss://localhost:7001", "rediss://cosmo:test@localhost:7002"},
-		//},
+		{
+			name:            "should successfully use auth from later url if no auth in first urls (skipped)",
+			clusterUrlSlice: []string{"redis://localhost:7003", "rediss://localhost:7001", "rediss://cosmo:test@localhost:7002"},
+		},
```

And in the loop:

```diff
-	for _, tt := range tests {
+	for _, tt := range tests {
 		t.Run(tt.name, func(t *testing.T) {
 			t.Parallel()
+			if strings.Contains(tt.name, "(skipped)") {
+				t.Skipf("Disabled under CI when cluster default user requires auth; follow-up needed to assert client URL auth merge semantics independently of server auth policy.")
+			}
```

.github/workflows/router-ci.yaml (2)
192-233: Redis Cluster services: good coverage; consider pinning image digests.

The 3-node cluster setup looks solid. To reduce supply-chain drift in CI, pin images to immutable digests (e.g., `bitnamilegacy/redis-cluster:7.2@sha256:...`).
309-319: Inline credentials in CI script (CKV_SECRET_4).

Even though this is test infra, embedding `cosmo:test` in URIs triggers secret scanners and isn’t future-proof. Use env vars in the step and reference them.

```diff
-      - name: Configure Redis Authentication & ACL
-        run: |
+      - name: Configure Redis Authentication & ACL
+        env:
+          REDIS_TEST_USER: cosmo
+          REDIS_TEST_PASSWORD: test
+        run: |
           docker ps -a
           # Set a password for each master node
-          for cid in $(docker ps --format "{{.ID}} {{.Image}}" | grep "redis-cluster" | awk '{print $1}'); do
+          for cid in $(docker ps --format "{{.ID}} {{.Image}}" | grep "redis-cluster" | awk '{print $1}'); do
             echo "Configuring ACLs in container $cid"
-            docker exec "$cid" redis-cli -p 6379 ACL SETUSER cosmo on ">test" "~*" "+@all"
+            docker exec "$cid" redis-cli -p 6379 ACL SETUSER "$REDIS_TEST_USER" on ">$REDIS_TEST_PASSWORD" "~*" "+@all"
             docker exec "$cid" redis-cli -p 6379 ping
           done
-          cid=$(docker ps --format "{{.ID}} {{.Image}}" | grep "redis:7" | awk '{print $1}')
+          cid=$(docker ps --format "{{.ID}} {{.Image}}" | grep "redis:7" | awk '{print $1}')
           # Sanity checks
           docker exec "$cid" redis-cli -p 6379 ping
-          docker exec "$cid" redis-cli -u "redis://cosmo:test@redis-0:6379" ping
-          docker exec "$cid" redis-cli -u "redis://cosmo:test@redis-0:6379" cluster nodes
+          docker exec "$cid" redis-cli -u "redis://${REDIS_TEST_USER}:${REDIS_TEST_PASSWORD}@redis-0:6379" ping
+          docker exec "$cid" redis-cli -u "redis://${REDIS_TEST_USER}:${REDIS_TEST_PASSWORD}@redis-0:6379" cluster nodes
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
.github/workflows/router-ci.yaml (3 hunks)
router-tests/events/nats_events_test.go (1 hunks)
router-tests/ratelimit_test.go (1 hunks)
🧰 Additional context used
🪛 Checkov (3.2.334)
.github/workflows/router-ci.yaml
[medium] 317-318: Basic Auth Credentials
(CKV_SECRET_4)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: build-router
- GitHub Check: build_push_image
- GitHub Check: build_push_image (nonroot)
- GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
- GitHub Check: image_scan (nonroot)
- GitHub Check: image_scan
- GitHub Check: integration_test (./events)
- GitHub Check: build_test
- GitHub Check: integration_test (./telemetry)
- GitHub Check: build_test
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (go)
🔇 Additional comments (1)
router-tests/events/nats_events_test.go (1)
65-66: Rename to TestFlaky… is fine; verify CI selection regex won’t over-exclude.

The rename cleanly gates this suite behind “flaky” runs. However, the CI regex used to exclude flakies (`^Test[^(Flaky)]`) will also skip any tests starting with TestF…, Testl…, Testa…, etc. (see the snippet below). Please update the workflow regex as suggested in router-ci.yaml to avoid unintentionally dropping non-flaky tests.
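To make the over-exclusion concrete, a standalone Go snippet (not from the PR) showing which names the current pattern would select:

```go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	// [^(Flaky)] is a character class: it rejects any single character in
	// the set {(, F, l, a, k, y, )}; it does not negate the word "Flaky".
	pattern := regexp.MustCompile(`^Test[^(Flaky)]`)
	for _, name := range []string{
		"TestFlakyNatsEvents", // intended exclusion
		"TestFeatureFlags",    // wrongly excluded: the char after "Test" is 'F'
		"TestBatch",           // kept: 'B' is not in the class
	} {
		fmt.Printf("%-22s selected=%v\n", name, pattern.MatchString(name))
	}
}
```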
- using gotestsum to retry flaky tests
Actionable comments posted: 2
🧹 Nitpick comments (1)
router-tests/Makefile (1)
4-4: Sanity‑check `--rerun-fails` behavior in your CI version of gotestsum.

Recent reports show `--rerun-fails` can sometimes yield a zero exit code despite failures, depending on version/output parsing. If you see anomalies, disable it or add `--rerun-fails-report` to surface retries. (github.com)

If helpful, I can add a CI‑guard to toggle reruns (e.g., `RERUN_FLAKY?=1` and `$(if $(RERUN_FLAKY),--rerun-fails=$(rerun_fails_count))`).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
router-tests/Makefile (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
- GitHub Check: Analyze (go)
- using gotestsum to retry flaky tests
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.github/workflows/router-ci.yaml (1)
159-161: Runner label typo blocks the job from scheduling.

Use ubuntu-latest; ubuntu-latest-l is invalid.

```diff
-    runs-on: ubuntu-latest-l
+    runs-on: ubuntu-latest
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
.github/workflows/router-ci.yaml (3 hunks)
router-tests/Makefile (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- router-tests/Makefile
🧰 Additional context used
🪛 Checkov (3.2.334)
.github/workflows/router-ci.yaml
[medium] 317-318: Basic Auth Credentials
(CKV_SECRET_4)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: build-router
- GitHub Check: integration_test (./telemetry)
- GitHub Check: build_test
- GitHub Check: build_push_image
- GitHub Check: integration_test (./events)
- GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
- GitHub Check: build_push_image (nonroot)
- GitHub Check: image_scan
- GitHub Check: image_scan (nonroot)
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (go)
🔇 Additional comments (2)
.github/workflows/router-ci.yaml (2)
256-294: Harden the “Wait for Redis Cluster” probe (strict mode, fail-fast, pick the creator by intent).

Same as the earlier suggestion; the current script may silently pass/fail and pick an arbitrary node.

```diff
-      - name: Wait for Redis Cluster
-        run: |
-          echo "[CHECK] Waiting for Redis Cluster to become healthy..."
-          cluster_containers=$(docker ps --quiet --filter "ancestor=bitnamilegacy/redis-cluster:7.2")
-
-          success=0
-          for i in {1..30}; do
-            if [ $i -eq 1 ]; then
-              echo "[INIT] Forcing cluster creation..."
-              # pick one container as the "creator"
-              creator=$(echo $cluster_containers | awk '{print $1}')
-              # run the cluster create command inside it
-              docker exec "$creator" redis-cli --cluster create redis-0:6379 redis-1:6379 redis-2:6379 --cluster-replicas 0 --cluster-yes || true
-            fi
-
-            for cid in $cluster_containers; do
-              docker exec "$cid" redis-cli -p 6379 cluster info
-              if docker exec "$cid" redis-cli -p 6379 cluster info 2>/dev/null | grep -q "cluster_state:ok"; then
-                echo "[SUCCESS] Redis Cluster is ready (reported by $cid)"
-                success=1
-                break 2
-              fi
-            done
-
-            echo "[WAITING] Cluster not ready yet (attempt $i)..."
-            sleep 2
-          done
-
-          if [ $success -eq 0 ]; then
-            echo "[ERROR] Redis Cluster did not become healthy in time"
-            for cid in $cluster_containers; do
-              echo "--- Cluster info for $cid ---"
-              docker exec "$cid" redis-cli -p 6379 cluster info || true
-              docker exec "$cid" redis-cli -p 6379 cluster nodes || true
-              echo "--- Logs for $cid ---"
-              docker logs "$cid" | tail -n 100
-            done
-            exit 1
-          fi
+      - name: Wait for Redis Cluster
+        run: |
+          set -euo pipefail
+          echo "[CHECK] Waiting for Redis Cluster to become healthy..."
+          cluster_containers=$(docker ps --format '{{.ID}} {{.Names}} {{.Image}}' | awk '$3 ~ /bitnamilegacy\/redis-cluster:7\.2/ {print $1 "|" $2}')
+          if [ -z "${cluster_containers}" ]; then
+            echo "[ERROR] No redis-cluster containers found"; docker ps -a; exit 1
+          fi
+          # Prefer the intended creator (REDIS_CLUSTER_CREATOR=yes) via inspect; fallback to container named *redis-2*; else first
+          creator=""
+          while IFS= read -r pair; do
+            cid="${pair%%|*}"; name="${pair##*|}"
+            if docker inspect -f '{{range .Config.Env}}{{println .}}{{end}}' "$cid" | grep -q '^REDIS_CLUSTER_CREATOR=yes$'; then creator="$cid"; fi
+            [ -n "${first:-}" ] || first="$cid"
+            if [ -z "$creator" ] && echo "$name" | grep -q 'redis-2'; then creator="$cid"; fi
+          done <<< "$cluster_containers"
+          creator="${creator:-$first}"
+          echo "[INIT] Forcing cluster creation from $creator ..."
+          docker exec "$creator" redis-cli --cluster create redis-0:6379 redis-1:6379 redis-2:6379 --cluster-replicas 0 --cluster-yes || true
+
+          for i in {1..30}; do
+            for pair in $cluster_containers; do
+              cid="${pair%%|*}"
+              if docker exec "$cid" redis-cli -p 6379 cluster info 2>/dev/null | grep -q "cluster_state:ok"; then
+                echo "[SUCCESS] Redis Cluster is ready (reported by $cid)"; exit 0
+              fi
+            done
+            echo "[WAITING] Cluster not ready yet (attempt $i)..."; sleep 2
+          done
+          echo "[ERROR] Redis Cluster did not become healthy in time"
+          for pair in $cluster_containers; do
+            cid="${pair%%|*}"; echo "--- Cluster info for $cid ---"
+            docker exec "$cid" redis-cli -p 6379 cluster info || true
+            docker exec "$cid" redis-cli -p 6379 cluster nodes || true
+            echo "--- Logs for $cid ---"; docker logs "$cid" | tail -n 200 || true
+          done
+          exit 1
```
192-233: Keep port publishing — tests use localhost:7001/7002/7003.

Search confirms tests reference the exposed host ports; do not remove port publishing (a connection sketch follows the locations below).
Locations:
- router-tests/testenv/testenv.go:399 — redisClusterHost = "redis://localhost:7001"
- router-tests/ratelimit_test.go:686, 701, 710, 755 — clusterUrlSlice references localhost:7001/7002/7003
- router-tests/automatic_persisted_queries_test.go:414 — clusterUrls include localhost:7001/7002
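For context, a minimal go-redis v9 sketch of how a test process on the CI runner would reach the cluster through those host-published ports; this is illustrative only, not code from this repo:

```go
package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

func main() {
	// Host-published ports 7001-7003 are what make these dials possible from
	// the test process; with container-only networking they would fail.
	rdb := redis.NewClusterClient(&redis.ClusterOptions{
		Addrs:    []string{"localhost:7001", "localhost:7002", "localhost:7003"},
		Username: "cosmo",
		Password: "test",
	})
	defer rdb.Close()

	if err := rdb.Ping(context.Background()).Err(); err != nil {
		fmt.Println("cluster not reachable:", err)
		return
	}
	fmt.Println("cluster reachable via host-published ports")
}
```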
- using gotestsum to retry flaky tests
- using gotestsum to retry flaky tests
- removed non-official custom gh actions to reduce supply-chain attack vector
Actionable comments posted: 0
🧹 Nitpick comments (1)
.github/workflows/router-ci.yaml (1)
192-233: Redis cluster services: good direction; pin images and prefer deterministic creator selection.
- Pin images by digest for determinism/supply-chain.
- Minor: since only redis-2 is the creator, add a comment to keep this invariant or derive creator in the wait script (preferred).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.github/workflows/router-ci.yaml (3 hunks)
🧰 Additional context used
🪛 Checkov (3.2.334)
.github/workflows/router-ci.yaml
[medium] 317-318: Basic Auth Credentials
(CKV_SECRET_4)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: build-router
- GitHub Check: build_push_image
- GitHub Check: image_scan (nonroot)
- GitHub Check: image_scan
- GitHub Check: build_push_image (nonroot)
- GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
- GitHub Check: integration_test (./events)
- GitHub Check: build_test
- GitHub Check: Analyze (go)
🔇 Additional comments (3)
.github/workflows/router-ci.yaml (3)
256-294: Harden the wait probe; fail-fast, pick the intended creator, keep diagnostics.

Same concern as the prior review: add strict shell, verify containers found, prefer the creator node, and preserve logs on failure.

```diff
-      - name: Wait for Redis Cluster
-        run: |
-          echo "[CHECK] Waiting for Redis Cluster to become healthy..."
-          cluster_containers=$(docker ps --quiet --filter "ancestor=bitnamilegacy/redis-cluster:7.2")
-
-          success=0
-          for i in {1..30}; do
-            if [ $i -eq 1 ]; then
-              echo "[INIT] Forcing cluster creation..."
-              # pick one container as the "creator"
-              creator=$(echo $cluster_containers | awk '{print $1}')
-              # run the cluster create command inside it
-              docker exec "$creator" redis-cli --cluster create redis-0:6379 redis-1:6379 redis-2:6379 --cluster-replicas 0 --cluster-yes || true
-            fi
-
-            for cid in $cluster_containers; do
-              docker exec "$cid" redis-cli -p 6379 cluster info
-              if docker exec "$cid" redis-cli -p 6379 cluster info 2>/dev/null | grep -q "cluster_state:ok"; then
-                echo "[SUCCESS] Redis Cluster is ready (reported by $cid)"
-                success=1
-                break 2
-              fi
-            done
-
-            echo "[WAITING] Cluster not ready yet (attempt $i)..."
-            sleep 2
-          done
-
-          if [ $success -eq 0 ]; then
-            echo "[ERROR] Redis Cluster did not become healthy in time"
-            for cid in $cluster_containers; do
-              echo "--- Cluster info for $cid ---"
-              docker exec "$cid" redis-cli -p 6379 cluster info || true
-              docker exec "$cid" redis-cli -p 6379 cluster nodes || true
-              echo "--- Logs for $cid ---"
-              docker logs "$cid" | tail -n 100
-            done
-            exit 1
-          fi
+      - name: Wait for Redis Cluster
+        run: |
+          set -euo pipefail
+          echo "[CHECK] Waiting for Redis Cluster to become healthy..."
+          cluster_containers=$(docker ps --format '{{.ID}} {{.Names}} {{.Image}}' | awk '$3 ~ /bitnamilegacy\/redis-cluster:7\.2/ {print $1 "|" $2}')
+          if [ -z "${cluster_containers}" ]; then
+            echo "[ERROR] No redis-cluster containers found"; docker ps -a; exit 1
+          fi
+          creator=""
+          first=""
+          while IFS= read -r line; do
+            cid="${line%%|*}"; name="${line##*|}"
+            [ -n "${first}" ] || first="$cid"
+            if echo "$name" | grep -q 'redis-2'; then creator="$cid"; fi
+          done <<< "$cluster_containers"
+          creator="${creator:-$first}"
+          echo "[INIT] Forcing cluster creation from $creator ..."
+          timeout 5s docker exec "$creator" redis-cli --cluster create redis-0:6379 redis-1:6379 redis-2:6379 --cluster-replicas 0 --cluster-yes || true
+
+          for i in {1..30}; do
+            for pair in $cluster_containers; do
+              cid="${pair%%|*}"
+              if timeout 3s docker exec "$cid" redis-cli -p 6379 cluster info 2>/dev/null | grep -q "cluster_state:ok"; then
+                echo "[SUCCESS] Redis Cluster is ready (reported by $cid)"; exit 0
+              fi
+            done
+            echo "[WAITING] Cluster not ready yet (attempt $i)..."; sleep 2
+          done
+          echo "[ERROR] Redis Cluster did not become healthy in time"
+          for pair in $cluster_containers; do
+            cid="${pair%%|*}"; echo "--- Cluster info for $cid ---"
+            docker exec "$cid" redis-cli -p 6379 cluster info || true
+            docker exec "$cid" redis-cli -p 6379 cluster nodes || true
+            echo "--- Logs for $cid ---"; docker logs "$cid" | tail -n 200 || true
+          done
+          exit 1
```
309-318: Don’t embed credentials; use env/secrets and redis-cli --user/-a. Fix CKV_SECRET_4.

This leaks creds into logs and triggers Checkov. Use masked env and avoid URIs.

```diff
-          # Set a password for each master node
-          for cid in $(docker ps --format "{{.ID}} {{.Image}}" | grep "redis-cluster" | awk '{print $1}'); do
-            echo "Configuring ACLs in container $cid"
-            docker exec "$cid" redis-cli -p 6379 ACL SETUSER cosmo on ">test" "~*" "+@all"
-            docker exec "$cid" redis-cli -p 6379 ping
-          done
-          cid=$(docker ps --format "{{.ID}} {{.Image}}" | grep "redis:7" | awk '{print $1}')
-          # Sanity checks
-          docker exec "$cid" redis-cli -p 6379 ping
-          docker exec "$cid" redis-cli -u "redis://cosmo:test@redis-0:6379" ping
-          docker exec "$cid" redis-cli -u "redis://cosmo:test@redis-0:6379" cluster nodes
+          # Configure ACLs (secrets masked by Actions)
+          ACL_USER="${ACL_USER:-cosmo}"
+          ACL_PASS="${CI_REDIS_PASS:-test}"
+          echo "::add-mask::$ACL_PASS"
+          for cid in $(docker ps --format '{{.ID}} {{.Image}}' | awk '$2 ~ /redis-cluster/ {print $1}'); do
+            echo "Configuring ACLs in container $cid"
+            docker exec "$cid" redis-cli -p 6379 ACL SETUSER "$ACL_USER" on ">$ACL_PASS" "~*" "+@all"
+            docker exec "$cid" redis-cli -p 6379 ping >/dev/null
+          done
+          cid=$(docker ps --format '{{.ID}} {{.Image}}' | awk '$2=="redis:7"{print $1}' | head -n1)
+          # Sanity checks (no URI with creds in logs)
+          docker exec "$cid" redis-cli -h redis-0 -p 6379 --user "$ACL_USER" -a "$ACL_PASS" ping
+          docker exec "$cid" redis-cli -h redis-0 -p 6379 --user "$ACL_USER" -a "$ACL_PASS" cluster nodes
```

Optionally add to the step:

```yaml
env:
  CI_REDIS_PASS: ${{ secrets.CI_REDIS_PASS }}
  ACL_USER: cosmo
```
321-321: Retries confirmed — no change needed.

router-tests/Makefile line 15 invokes gotestsum with --rerun-fails="$(test_retry_count)", so the workflow's test invocation will trigger retries.
Actionable comments posted: 0
🧹 Nitpick comments (1)
.github/workflows/router-ci.yaml (1)
192-233: Redis Cluster services: consider avoiding host port exposure and pin image by digest.
- You can rely on service DNS (redis-0/1/2:6379) from the job container and drop host port mappings (7001-7003) to reduce collision risk.
- Pin bitnamilegacy/redis-cluster:7.2 by digest to harden supply chain and avoid accidental tag drift.
- If DockerHub rate limiting is a concern, consider mirroring to GHCR or using an org-managed cache.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.github/workflows/router-ci.yaml (3 hunks)
🧰 Additional context used
🪛 Checkov (3.2.334)
.github/workflows/router-ci.yaml
[medium] 317-318: Basic Auth Credentials
(CKV_SECRET_4)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: build-router
- GitHub Check: build_test
- GitHub Check: integration_test (./events)
- GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
- GitHub Check: integration_test (./telemetry)
- GitHub Check: build_push_image (nonroot)
- GitHub Check: image_scan
- GitHub Check: image_scan (nonroot)
- GitHub Check: build_test
- GitHub Check: build_push_image
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (go)
🔇 Additional comments (5)
.github/workflows/router-ci.yaml (5)
256-294: Harden the Redis Cluster wait/probe; fail fast and prefer the creator node.

Add strict shell options, error out when no containers are found, and prefer the intended creator (redis-2). Keep diagnostics on failure.

```diff
-      - name: Wait for Redis Cluster
-        run: |
-          echo "[CHECK] Waiting for Redis Cluster to become healthy..."
-          cluster_containers=$(docker ps --quiet --filter "ancestor=bitnamilegacy/redis-cluster:7.2")
-
-          success=0
-          for i in {1..30}; do
-            if [ $i -eq 1 ]; then
-              echo "[INIT] Forcing cluster creation..."
-              # pick one container as the "creator"
-              creator=$(echo $cluster_containers | awk '{print $1}')
-              # run the cluster create command inside it
-              docker exec "$creator" redis-cli --cluster create redis-0:6379 redis-1:6379 redis-2:6379 --cluster-replicas 0 --cluster-yes || true
-            fi
-
-            for cid in $cluster_containers; do
-              docker exec "$cid" redis-cli -p 6379 cluster info
-              if docker exec "$cid" redis-cli -p 6379 cluster info 2>/dev/null | grep -q "cluster_state:ok"; then
-                echo "[SUCCESS] Redis Cluster is ready (reported by $cid)"
-                success=1
-                break 2
-              fi
-            done
-
-            echo "[WAITING] Cluster not ready yet (attempt $i)..."
-            sleep 2
-          done
-
-          if [ $success -eq 0 ]; then
-            echo "[ERROR] Redis Cluster did not become healthy in time"
-            for cid in $cluster_containers; do
-              echo "--- Cluster info for $cid ---"
-              docker exec "$cid" redis-cli -p 6379 cluster info || true
-              docker exec "$cid" redis-cli -p 6379 cluster nodes || true
-              echo "--- Logs for $cid ---"
-              docker logs "$cid" | tail -n 100
-            done
-            exit 1
-          fi
+      - name: Wait for Redis Cluster
+        run: |
+          set -euo pipefail
+          echo "[CHECK] Waiting for Redis Cluster to become healthy..."
+          cluster_containers=$(docker ps --format '{{.ID}} {{.Names}} {{.Image}}' | awk '$3 ~ /bitnamilegacy\/redis-cluster:7\.2/ {print $1 "|" $2}')
+          if [ -z "${cluster_containers}" ]; then
+            echo "[ERROR] No redis-cluster containers found"; docker ps -a; exit 1
+          fi
+          # Prefer the creator node if present; otherwise pick the first
+          creator=""
+          while IFS= read -r line; do
+            cid="${line%%|*}"; name="${line##*|}"
+            if echo "$name" | grep -q "redis-2"; then creator="$cid"; fi
+            [ -n "${first:-}" ] || first="$cid"
+          done <<< "$cluster_containers"
+          creator="${creator:-$first}"
+          echo "[INIT] Forcing cluster creation from $creator ..."
+          docker exec "$creator" redis-cli --cluster create redis-0:6379 redis-1:6379 redis-2:6379 --cluster-replicas 0 --cluster-yes || true
+
+          for i in {1..30}; do
+            for pair in $cluster_containers; do
+              cid="${pair%%|*}"
+              if docker exec "$cid" redis-cli -p 6379 cluster info 2>/dev/null | grep -q "cluster_state:ok"; then
+                echo "[SUCCESS] Redis Cluster is ready (reported by $cid)"; exit 0
+              fi
+            done
+            echo "[WAITING] Cluster not ready yet (attempt $i)..."; sleep 2
+          done
+          echo "[ERROR] Redis Cluster did not become healthy in time"
+          for pair in $cluster_containers; do
+            cid="${pair%%|*}"; echo "--- Cluster info for $cid ---"
+            docker exec "$cid" redis-cli -p 6379 cluster info || true
+            docker exec "$cid" redis-cli -p 6379 cluster nodes || true
+            echo "--- Logs for $cid ---"; docker logs "$cid" | tail -n 200 || true
+          done
+          exit 1
```
306-318: Don’t embed credentials; pass via env and use redis-cli --user/-a.

Avoid leaking creds into logs and satisfy CKV_SECRET_4.

```diff
           # Set a password for each master node
-          for cid in $(docker ps --format "{{.ID}} {{.Image}}" | grep "redis-cluster" | awk '{print $1}'); do
-            echo "Configuring ACLs in container $cid"
-            docker exec "$cid" redis-cli -p 6379 ACL SETUSER cosmo on ">test" "~*" "+@all"
-            docker exec "$cid" redis-cli -p 6379 ping
+          ACL_USER="${ACL_USER:-cosmo}"
+          ACL_PASS="${ACL_PASS:-test}"
+          export ACL_USER ACL_PASS
+          set +x
+          for cid in $(docker ps --format '{{.ID}} {{.Image}}' | awk '$2=="bitnamilegacy/redis-cluster:7.2"{print $1}'); do
+            echo "Configuring ACLs in container $cid"
+            docker exec "$cid" redis-cli -p 6379 ACL SETUSER "$ACL_USER" on ">$ACL_PASS" "~*" "+@all"
+            docker exec "$cid" redis-cli -p 6379 --user "$ACL_USER" -a "$ACL_PASS" ping
           done
-          cid=$(docker ps --format "{{.ID}} {{.Image}}" | grep "redis:7" | awk '{print $1}')
+          cid=$(docker ps --format '{{.ID}} {{.Image}}' | awk '$2=="redis:7"{print $1}' | head -n1)
           # Sanity checks
-          docker exec "$cid" redis-cli -p 6379 ping
-          docker exec "$cid" redis-cli -u "redis://cosmo:test@redis-0:6379" ping
-          docker exec "$cid" redis-cli -u "redis://cosmo:test@redis-0:6379" cluster nodes
+          docker exec "$cid" redis-cli -h redis-0 -p 6379 --user "$ACL_USER" -a "$ACL_PASS" ping
+          docker exec "$cid" redis-cli -h redis-0 -p 6379 --user "$ACL_USER" -a "$ACL_PASS" cluster nodes
```

Note: if you can use repo secrets, prefer setting ACL_USER/ACL_PASS via secrets and omit the default.
192-233: Claim vs implementation: host ports mapped contradict “container-scoped” usage.

PR summary says “container-scoped port usage,” but 7001-7003 are host-published here.
Can tests resolve redis-0/1/2 by service DNS and drop the port publishes?
159-159: Verify runner label “ubuntu-latest-l”.

Is this a custom self-hosted label? If not, GitHub’s standard label is “ubuntu-latest”.
321-321: LGTM on test invocation.

The tighter timeout and explicit parallelism look fine.
Actionable comments posted: 0
🧹 Nitpick comments (1)
router-tests/ratelimit_test.go (1)
703-707: Consider opening a tracking issue for the commented-out test.

The TODO note explains that the test was a false positive and requires investigation. To ensure this task is not forgotten, consider opening a tracking issue (if one doesn't already exist) to investigate and fix the underlying functionality.
Do you want me to open a new issue to track this task?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
router-tests/ratelimit_test.go (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: build-router
- GitHub Check: Analyze (go)
- GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
- GitHub Check: build_push_image
- GitHub Check: image_scan
- GitHub Check: build_push_image (nonroot)
- GitHub Check: image_scan (nonroot)
- GitHub Check: integration_test (./events)
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: build_test
Actionable comments posted: 0
🧹 Nitpick comments (2)
router-tests/events/nats_events_test.go (2)
1614-1724: Consider documenting the flakiness root cause.

The test implementation looks correct and comprehensive. However, isolating flaky tests without addressing the root cause can lead to accumulating technical debt, as noted in past review comments.
Consider adding a comment explaining:
- Why this test is flaky (e.g., timing-sensitive SSE event delivery, NATS message ordering)
- Whether the flakiness is environmental or indicates a potential product issue
- A reference to track investigation of the root cause (e.g., issue number)
This documentation will help future contributors understand whether the flakiness is acceptable or needs fixing.
Example:

```diff
 func TestFlakyNatsEvents(t *testing.T) {
 	t.Parallel()
+	// TODO(ENG-XXXX): This test is flaky due to timing sensitivity in SSE event delivery
+	// when multiple events are published rapidly. The test occasionally misses events
+	// or receives them out of order. This is likely a test infrastructure issue rather
+	// than a product bug, but should be investigated.
 	t.Run("subscribe sse with filter", func(t *testing.T) {
```
1614-1616: Establish pattern for additional flaky tests.

Good separation of flaky tests into a dedicated top-level function. If additional tests need to be marked as flaky in the future, consider adding them as sub-tests within `TestFlakyNatsEvents` rather than creating multiple `TestFlaky*` functions, as sketched below. This keeps all flaky tests in one place and makes CI configuration simpler.

Based on past review comments, which mentioned potential flakiness in Kafka and Redis tests.
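A minimal sketch of that consolidation; the second sub-test name is hypothetical:

```go
package events

import "testing"

func TestFlakyNatsEvents(t *testing.T) {
	t.Parallel()

	// Existing flaky case stays a sub-test.
	t.Run("subscribe sse with filter", func(t *testing.T) {
		t.Parallel()
		// ... test body elided ...
	})

	// Future flaky cases join here instead of becoming new TestFlaky* functions,
	// so CI keeps selecting on the single "TestFlaky" prefix, and one case can
	// still be run alone, e.g. go test -run 'TestFlakyNatsEvents/subscribe'.
	// t.Run("kafka consumer rebalance", func(t *testing.T) { /* ... */ })
}
```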
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
router-tests/events/nats_events_test.go (1 hunks)
router-tests/ratelimit_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- router-tests/ratelimit_test.go
🧰 Additional context used
🧬 Code graph analysis (1)
router-tests/events/nats_events_test.go (2)
router-tests/testenv/testenv.go (4)
Run (107-124)
Config (286-343)
ConfigWithEdfsNatsJSONTemplate (87-87)
Environment (1731-1767)

router-tests/testenv/utils.go (1)
AwaitChannelWithT (10-19)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: build-router
- GitHub Check: build_push_image
- GitHub Check: build_push_image (nonroot)
- GitHub Check: image_scan (nonroot)
- GitHub Check: image_scan
- GitHub Check: integration_test (./events)
- GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
- GitHub Check: build_test
- GitHub Check: Analyze (go)
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (1)
router-tests/events/nats_events_test.go (1)
1614-1724: Confirm removal of "subscribe sse with filter" from TestNatsEvents; only one occurrence remains in TestFlakyNatsEvents.
Actionable comments posted: 0
🧹 Nitpick comments (1)
router-tests/batch_test.go (1)
750-803: Clarify test duplication and flaky subset.

This new `TestFlakyBatch` function appears to duplicate logic from the existing "Verify all root span attributes for batch requests" test in `TestBatch` (lines 565-631), but with significantly reduced attribute coverage (7 attributes vs 24).

Given the past review discussion about isolating specific flaky tests, please clarify:
- Is this the minimal flaky subset, or are you still determining which specific assertions are flaky?
- Why were these particular 7 attributes selected for the flaky test?
- Should the original comprehensive test in `TestBatch` remain, or will it be removed once flakiness is resolved?

The reduced coverage could mask issues with the untested attributes, so understanding the rationale will help ensure test quality.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
router-tests/batch_test.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
router-tests/batch_test.go (4)
router-tests/testenv/testenv.go (4)
Run (107-124)
Config (286-343)
Environment (1731-1767)
GraphQLRequest (1907-1915)

router/pkg/trace/tracetest/tracetest.go (1)
NewInMemoryExporter (11-17)

router/pkg/config/config.go (2)
Config (998-1072)
BatchingConfig (879-884)

router/pkg/otel/attributes.go (7)
WgRouterConfigVersion (21-21)
WgRouterRootSpan (32-32)
WgIsBatchingOperation (49-49)
WgBatchingOperationsCount (50-50)
WgOperationHash (14-14)
WgClientName (18-18)
WgClientVersion (19-19)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: build-router
- GitHub Check: image_scan
- GitHub Check: build_push_image (nonroot)
- GitHub Check: build_push_image
- GitHub Check: image_scan (nonroot)
- GitHub Check: integration_test (./telemetry)
- GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
- GitHub Check: build_test
- GitHub Check: integration_test (./events)
- GitHub Check: Analyze (go)
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (1)
router-tests/batch_test.go (1)
791-791: Dynamic config version check differs from original test.

Line 791 uses `xEnv.RouterConfigVersionMain()` for dynamic version checking, while the original test at line 610 uses `sa.HasValue(otel.WgRouterConfigVersion)` to check for attribute existence.

This difference could explain flakiness if the config version value changes across test runs. However, the original comprehensive test also passes, suggesting this might not be the root cause.
Consider verifying whether the config version value is stable in your CI environment.
Summary by CodeRabbit
Checklist